Data Archaeology

There are two conceptualisations of data archaeology: the technical definition and the social science definition.

Data archaeology (also data archeology) in the technical sense refers to the art and science of recovering computer data encoded and/or encrypted in now obsolete media or formats. Data archaeology can also refer to recovering information from damaged electronic formats after natural disasters or human error. It entails the rescue and recovery of old data trapped in outdated, archaic or obsolete storage formats such as floppy disks, magnetic tape and punch cards, and the transformation or transfer of that data to more usable formats.

Data archaeology in the social sciences usually involves an investigation into the source and history of datasets and how these datasets were constructed. It involves mapping out the entire lineage of the data, its nature and characteristics, its quality and veracity, and how these affect the analysis and interpretation of the dataset. The findings of data archaeology determine how far the conclusions drawn from analysing the data can be trusted.

The term data archaeology originally appeared in 1993 as part of the Global Oceanographic Data Archaeology and Rescue Project (GODAR). The original impetus for data archaeology came from the need to recover computerised records of climatic conditions stored on old computer tape, which can provide valuable evidence for testing theories of climate change. These approaches allowed the reconstruction of an image of the Arctic that had been captured by the Nimbus 2 satellite on September 23, 1966, in higher resolution than ever seen before from this type of data. NASA also utilises the services of data archaeologists to recover information stored on 1960s-era vintage computer tape, as exemplified by the Lunar Orbiter Image Recovery Project (LOIRP).


Recovery

There is a distinction between data recovery and data intelligibility. One may be able to recover data but not understand it. For data archaeology to be effective, the data must be intelligible.
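Recovered bytes are not the same as intelligible data. As a minimal illustration, assume a record recovered from an old mainframe tape was stored in EBCDIC (Python's built-in `cp037` codec) rather than ASCII; that encoding, and the helper names `intelligibility` and `best_decoding`, are hypothetical choices for this sketch. The helper tries several candidate encodings and scores each result by the fraction of plain printable ASCII characters, which is a crude heuristic and not a standard recovery method.

```python
import string

# Characters treated as "intelligible": plain printable ASCII.
PRINTABLE = set(string.ascii_letters + string.digits + string.punctuation + " ")


def intelligibility(text: str) -> float:
    """Return the fraction of characters in text that are printable ASCII."""
    if not text:
        return 0.0
    return sum(ch in PRINTABLE for ch in text) / len(text)


def best_decoding(raw: bytes, candidates=("ascii", "latin-1", "cp037")):
    """Decode raw with each candidate codec and return the most readable result.

    Returns (codec, text, score). Codecs that cannot decode the bytes are
    skipped; "latin-1" accepts any byte sequence, so at least one result exists
    as long as it is among the candidates. On ties, the earlier candidate wins.
    """
    results = []
    for codec in candidates:
        try:
            text = raw.decode(codec)
        except UnicodeDecodeError:
            continue  # e.g. bytes >= 0x80 are invalid in ASCII
        results.append((codec, text, intelligibility(text)))
    return max(results, key=lambda r: r[2])


# Hypothetical recovered record: the text "TEMP 21.5" stored as EBCDIC.
recovered = "TEMP 21.5".encode("cp037")
print(recovered.decode("latin-1"))  # the bytes are readable but meaningless
print(best_decoding(recovered))     # ('cp037', 'TEMP 21.5', 1.0)
```

The same bytes decode without error as Latin-1 yet yield gibberish; only knowledge of the original encoding makes them intelligible.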
A term closely related to data archaeology is data lineage. The first step in performing data archaeology is an investigation into the data's lineage: the history of the data, its source, and any alterations or transformations it has undergone. Data lineage can be found in the metadata of a dataset, its paradata, or any accompanying documentation (methodological guides, etc.). Closely tied to data archaeology is methodological transparency, the degree to which the data user can access the data's history. The level of methodological transparency available determines not only how much can be recovered but also how well the data can be understood. A data lineage investigation covers what instruments were used, what the selection criteria were, the measurement parameters and the sampling frameworks.

In the socio-political sense, data archaeology involves the analysis of data assemblages to reveal their discursive and material socio-technical elements and apparatuses. This kind of analysis can reveal the politics of the data being analysed, and thus those of the institution that produced it. Archaeology in this sense refers to the provenance of data. It involves mapping the sites, formats and infrastructures through which data flows and is altered or transformed over time. It is concerned with the life of data and the politics that shape its circulation, and serves to expose the key actors, practices and praxes at play and their roles. It can be accomplished in two steps. The first is accessing and assessing the technical stack of the data (the infrastructure and material technologies used to build or gather the data) to understand its physical representation. The second is analysing the contextual stack of the data, which shapes how the data is constructed, used and analysed. This can be done through a variety of processes: interviews, analysis of technical and policy documents, and investigation of the data's effect on a community or of its institutional, financial, legal and material framing. It can also be attained by mapping the data assemblage. Data archaeology charts the way data moves across different sites, and along the way it can encounter data friction.
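The lineage questions listed above (source, instruments, selection criteria, measurement parameters, sampling framework, and transformations) can be recorded in a simple data structure. The sketch below is hypothetical: `LineageRecord` and its field names are illustrative, not a standard metadata schema, and the `transparency` score (the share of documented fields) is only a rough proxy for methodological transparency.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class LineageRecord:
    """Hypothetical summary of what is known about a dataset's history."""
    source: Optional[str] = None                  # where the data originated
    instruments: Optional[str] = None             # instruments used to collect it
    selection_criteria: Optional[str] = None      # what was included or excluded
    measurement_parameters: Optional[str] = None  # e.g. units, precision
    sampling_framework: Optional[str] = None      # how observations were sampled
    transformations: List[str] = field(default_factory=list)  # alterations, in order

    def add_transformation(self, step: str) -> None:
        """Append one alteration or transformation to the history."""
        self.transformations.append(step)

    def gaps(self) -> List[str]:
        """Names of lineage fields that are still undocumented."""
        missing = [
            f.name for f in fields(self)
            if f.name != "transformations" and getattr(self, f.name) is None
        ]
        if not self.transformations:
            missing.append("transformations")
        return missing

    def transparency(self) -> float:
        """Share of lineage fields that are documented, from 0.0 to 1.0."""
        total = len(fields(self))
        return (total - len(self.gaps())) / total


# Hypothetical example: partial lineage for an old climate dataset.
rec = LineageRecord(source="shipboard temperature logs", instruments="thermometer")
rec.add_transformation("transferred from punch cards to magnetic tape")
print(rec.gaps())          # ['selection_criteria', 'measurement_parameters', 'sampling_framework']
print(rec.transparency())  # 0.5
```

Listing the gaps makes explicit which parts of the data's history still need to be reconstructed from paradata or documentation.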


Disaster recovery

Data archaeologists can also use data recovery after natural disasters such as fires, floods, earthquakes, or even hurricanes. For example, in 1995, during Hurricane Marilyn, the National Media Lab assisted the National Archives and Records Administration in recovering data at risk due to damaged equipment. The hardware was damaged by rain, salt water and sand, yet it was possible to clean some of the disks and refit them with new cases, thus saving the data within.


Recovery techniques

When deciding whether or not to try to recover data, the cost must be taken into account. Given enough time and money, most data can be recovered. For magnetic media, the most common type used for data storage, various techniques can be used to recover the data depending on the type of damage.

Humidity can make tapes unusable as they begin to deteriorate and become sticky. In this case, a heat treatment can be applied, which causes the oils and residues either to be reabsorbed into the tape or to evaporate off its surface. However, this should only be done to provide access to the data so that it can be extracted and copied to a more stable medium.

Lubrication loss is another source of damage to tapes. It is most commonly caused by heavy use, but can also result from improper storage or natural evaporation. With heavy use, some of the lubricant can remain on the read-write heads, which then collect dust and particles that can damage the tape. Loss of lubrication can be addressed by re-lubricating the tapes. This should be done cautiously, as excessive re-lubrication can cause tape slippage, which in turn can lead to the media being misread and data being lost.

Water exposure will damage tapes over time, and often occurs in a disaster situation. If the media has been in salty or dirty water, it should be rinsed in fresh water. Cleaning, rinsing and drying wet tapes should be done at room temperature to prevent heat damage. Older tapes should be recovered before newer tapes, as they are more susceptible to water damage.

The next step (after investigating the data lineage) is to establish what counts as good data and bad data, to ensure that only the good data gets migrated to the new data warehouse or repository. A typical example of bad data is test data, in the technical sense of data created specifically for use in tests of a computer program.
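Once the criteria for good and bad data have been established, screening records before migration can be expressed as a set of named rules. In this minimal sketch the rules themselves are assumptions, not criteria from the text: a record is treated as test data if it carries an `is_test` flag or an ID beginning with `TEST-`, and a record with no `value` is rejected as incomplete. Real criteria would come from the dataset's lineage and documentation.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = dict
Rule = Callable[[Record], bool]


def is_test_data(rec: Record) -> bool:
    # Assumed convention: test records have an "is_test" flag or a "TEST-" id prefix.
    return bool(rec.get("is_test")) or str(rec.get("id", "")).startswith("TEST-")


def is_missing_value(rec: Record) -> bool:
    # Assumed rule: a record without a measured value is not worth migrating.
    return rec.get("value") is None


# Hypothetical rule set, checked in insertion order.
BAD_DATA_RULES: Dict[str, Rule] = {
    "test data": is_test_data,
    "missing value": is_missing_value,
}


def screen(records: Iterable[Record], rules: Dict[str, Rule] = BAD_DATA_RULES):
    """Split records into (good, rejected) before migration.

    Each rejected entry pairs the record with the name of the first rule it failed.
    """
    good: List[Record] = []
    rejected: List[Tuple[Record, str]] = []
    for rec in records:
        reason = next((name for name, rule in rules.items() if rule(rec)), None)
        if reason is None:
            good.append(rec)
        else:
            rejected.append((rec, reason))
    return good, rejected


rows = [
    {"id": "1950-001", "value": 12.5},
    {"id": "TEST-01", "value": 0.0},
    {"id": "1950-002", "value": None},
]
good, rejected = screen(rows)
print([r["id"] for r in good])                   # ['1950-001']
print([(r["id"], why) for r, why in rejected])   # [('TEST-01', 'test data'), ('1950-002', 'missing value')]
```

Keeping the rejected records together with a reason, rather than silently dropping them, leaves an audit trail that can itself become part of the migrated data's lineage.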


Prevention

To prevent the need for data archaeology, creators and holders of digital documents should take care to employ digital preservation. Another effective preventive measure is the use of offshore backup facilities that would not be affected should a disaster occur; copies of lost data can then be retrieved from these backup servers. A multi-site and multi-technique data distribution plan is advised for optimal data recovery, especially when dealing with big data. The TCP/IP method, snapshot recovery, mirror sites, and tapes safeguarding data in a private cloud are also all good preventive methods, as is transferring data daily from mirror sites to emergency servers.
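The daily transfer from mirror sites to emergency servers can be sketched as a one-way sync. This is a minimal illustration under stated assumptions: both sites are reachable as local directory paths (a simplification; a real deployment would copy over the network), `sync_mirror` and `sha256_of` are hypothetical helper names, and the SHA-256 verification of each copy is an added safeguard that the text does not describe.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sync_mirror(mirror: Path, emergency: Path) -> List[str]:
    """Copy new or changed files from mirror to emergency and verify each copy.

    Returns the relative paths (POSIX style) of the files that were copied.
    Files already present with an identical checksum are skipped.
    """
    copied = []
    for src in sorted(p for p in mirror.rglob("*") if p.is_file()):
        rel = src.relative_to(mirror)
        dst = emergency / rel
        src_hash = sha256_of(src)
        if dst.exists() and sha256_of(dst) == src_hash:
            continue  # already up to date on the emergency server
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies file contents and metadata
        if sha256_of(dst) != src_hash:
            raise IOError(f"checksum mismatch after copying {rel}")
        copied.append(rel.as_posix())
    return copied
```

Calling something like `sync_mirror(Path("/mnt/mirror"), Path("/mnt/emergency"))` (hypothetical paths) once a day from a scheduler such as cron would approximate the daily transfer; comparing checksums rather than timestamps means only files whose contents actually changed are copied again.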


See also

* Bit rot
* Data curation
* Data preservation
* Digital dark age
* Digital preservation
* Knowledge discovery

